Abstract. Given a formal language L specified in various ways, we 
consider the problem of determining if L is nonempty. If L is indeed 
nonempty, we find upper and lower bounds on the length of the shortest 
string in L. 



1 Introduction 

Given a formal language L specified in some finite way, a common problem is to 
determine whether L is nonempty. And if L is indeed nonempty, then another 
common problem is to determine good upper and lower bounds on the length of 
the shortest string in L, which we write as lss(L). Such bounds can be useful, 
for example, in estimating the state complexity of L, since lss(L) < sc(L). 

As an example, we start with a very simple result often stated in introductory 
classes on formal language theory. 

Proposition 1. Let L be accepted by an NFA M with n states and t transitions. 
Then we can decide in time $O(n + t)$ whether $L \neq \emptyset$. If L is nonempty, then 
lss(L) < n. Further, this bound is tight. 
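
Proposition 1 can be read as a plain graph search. The following is a minimal sketch, assuming the NFA is given as a dictionary `delta` from `(state, symbol)` pairs to sets of successor states; the function name and this encoding are illustrative choices, not notation from the paper.

```python
from collections import deque

def shortest_accepted_length(start, finals, delta):
    """Return lss(L) for the NFA, or None if L is empty.

    delta maps (state, symbol) -> iterable of successor states (assumed encoding).
    """
    # Collapse the labelled transitions into an adjacency list: O(n + t).
    succ = {}
    for (state, _symbol), targets in delta.items():
        succ.setdefault(state, set()).update(targets)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in finals:
            # BFS pops states in order of distance, so this is minimal.
            return dist[state]
        for nxt in succ.get(state, ()):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None

# A 5-state unary cycle whose only final state is the last one.
n = 5
cycle = {(i, "a"): {(i + 1) % n} for i in range(n)}
print(shortest_accepted_length(0, {n - 1}, cycle))  # 4, i.e. n - 1
```

Since the search never revisits a state, a shortest accepting path uses at most n - 1 transitions, which is the bound lss(L) < n; the cycle above shows the bound is attained.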

We now turn to a more challenging example. Here L is specified as the 
complement of a language accepted by an NFA. 

Theorem 1. Let L be accepted by an NFA with n states. Then it is 
PSPACE-complete to determine whether $\overline{L} \neq \emptyset$. If $\overline{L} \neq \emptyset$, then 
$\mathrm{lss}(\overline{L}) < 2^n$. Further, for some constant $c$, $0 < c \le 1$, there is an 
infinite family of examples with n states such that $\mathrm{lss}(\overline{L}) \ge 2^{cn}$. 



Proof. For the PSPACE-completeness, see [1]. 

The upper bound is easy and follows from the subset construction. The lower 
bound is significantly harder; see [S]. □ 

These two examples set the theme of the paper. We examine several problems 
about shortest strings in regular languages and prove bounds for lss(L). Some 
of the results have appeared in the master's thesis of the second author [3]. 



2 The first problem 



Recall the following classical result about intersections of regular languages. 

Proposition 2. Let $L_1$ (resp., $L_2$) be accepted by an NFA with $s_1$ states and $t_1$ 
transitions (resp., $s_2$ states and $t_2$ transitions). Then $L_1 \cap L_2$ is accepted by an 
NFA with $s_1 s_2$ states and $t_1 t_2$ transitions. 

Proof. Use the usual direct product construction. □ 

This suggests the following natural problems. Given NFA's $M_1$ and $M_2$ as 
above, decide if $L(M_1) \cap L(M_2) \neq \emptyset$. This can clearly be done in $O(s_1 s_2 + t_1 t_2)$ 
time, by using the direct product construction followed by breadth-first or 
depth-first search. 

Now assume $L(M_1) \cap L(M_2) \neq \emptyset$. What is a good bound on 
$\mathrm{lss}(L(M_1) \cap L(M_2))$? Combining Propositions 1 and 2 we immediately get the 
upper bound $\mathrm{lss}(L(M_1) \cap L(M_2)) < s_1 s_2$. 

However, is this bound tight? For $\gcd(s_1, s_2) = 1$ an obvious construction 
shows it is, even in the unary case: choose $L_1 = a^{s_1-1}(a^{s_1})^*$ and 
$L_2 = a^{s_2-1}(a^{s_2})^*$. However, this idea no longer works for $\gcd(s_1, s_2) > 1$. 
Nevertheless, the bound $s_1 s_2 - 1$ is tight for binary and larger alphabets, as the 
following result shows. 
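
The unary example can be checked numerically. The sketch below assumes the two languages are $a^{s_1-1}(a^{s_1})^*$ and $a^{s_2-1}(a^{s_2})^*$, so a length belongs to both exactly when it is congruent to $-1$ modulo $s_1$ and modulo $s_2$; the function name is a hypothetical helper.

```python
from math import gcd

def lss_unary_pair(s1, s2):
    """Least length ell with ell = s1-1 (mod s1) and ell = s2-1 (mod s2)."""
    period = s1 * s2 // gcd(s1, s2)  # lcm(s1, s2): both conditions repeat with this period
    for ell in range(period):
        if ell % s1 == s1 - 1 and ell % s2 == s2 - 1:
            return ell
    return None  # not reached: period - 1 always satisfies both congruences

print(lss_unary_pair(3, 4))  # 11 = 3*4 - 1, since gcd(3, 4) = 1
print(lss_unary_pair(4, 6))  # 11 = lcm(4, 6) - 1, well below 4*6 - 1 = 23
```

The second call illustrates why the unary construction falls short when the state counts share a factor: the shortest common string has length lcm(s1, s2) - 1 rather than s1 s2 - 1.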

Theorem 2. For all integers $m, n \ge 1$ there exist DFAs $M_1, M_2$ with $m$ and 
$n$ states, respectively, and with $|\Sigma| = 2$ such that $L(M_1) \cap L(M_2) \neq \emptyset$, and 
$\mathrm{lss}(L(M_1) \cap L(M_2)) = mn - 1$. 

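Proof. The proof is constructive. Without loss of generality, assume $m \le n$, 
and set $\Sigma = \{0, 1\}$. Let $M_1$ be the DFA given by $(Q_1, \Sigma, \delta_1, p_0, F_1)$, where 
$Q_1 = \{p_0, p_1, p_2, \ldots, p_{m-1}\}$, $F_1 = \{p_0\}$, and for each $a$, $0 \le a \le m-1$, and 
$c \in \{0, 1\}$ we set 

$$\delta_1(p_a, c) = p_{(a+c) \bmod m}.$$

Then 

$$L(M_1) = \{x \in \Sigma^* : |x|_1 \equiv 0 \pmod{m}\}.$$

Let $M_2$ be the DFA $(Q_2, \Sigma, \delta_2, q_0, F_2)$, shown in Figure 1, where 
$Q_2 = \{q_0, q_1, \ldots, q_{n-1}\}$, $F_2 = \{q_{n-1}\}$, and for each $a$, $0 \le a \le n-1$, 

$$\delta_2(q_a, c) = \begin{cases}
q_{a+c}, & \text{if } 0 \le a < m-1; \\
q_{(a+1) \bmod n}, & \text{if } c = 0 \text{ and } m-1 \le a \le n-1; \\
q_0, & \text{if } c = 1 \text{ and } m-1 \le a \le n-1.
\end{cases}$$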








Fig. 1. The DFA $M_2$. 

Focusing solely on the 1's that appear in some accepting computation in $M_2$, 
we see that we can return to $q_0$ 

(a) via a simple path with $m$ 1's, or 

(b) (if we go through $q_{n-1}$), via a simple path with $(m-1)$ 1's and ending in 
the transition $\delta_2(q_{n-1}, 0) = q_0$. 

After some number of cycles through $q_0$, we eventually arrive at $q_{n-1}$. Letting $i$ 
denote the number of times a path of type (b) is chosen (including the last path 
that arrives at $q_{n-1}$) and $j$ denote the number of times a path of type (a) is 
chosen, we see that the number of 1's in any accepted word must be of the form 
$i(m-1) + jm$, with $i > 0$, $j \ge 0$. The number of 0's along such a path is then 
at least $i(n-m+1) - 1$, with the $-1$ in this expression arising from the fact 
that the last part of the path terminates at $q_{n-1}$ without taking an additional 
transition back to $q_0$. 

Thus 

$$L(M_2) \subseteq \{x \in \Sigma^* : \exists\, i, j \in \mathbb{N} \text{ such that } i > 0,\ j \ge 0, \text{ and } |x|_1 = i(m-1) + jm,\ |x|_0 \ge i(n-m+1) - 1\}.$$

Furthermore, for every $i, j \in \mathbb{N}$ such that $i > 0$, $j \ge 0$, there exists an 
$x \in L(M_2)$ such that $|x|_1 = i(m-1) + jm$ and $|x|_0 = i(n-m+1) - 1$. This is 
obtained, for example, by cycling $j$ times from $q_0$ to $q_{m-1}$ and then back to $q_0$ 
via a transition on 1, then $i - 1$ times from $q_0$ to $q_{n-1}$ and then back to $q_0$ via 
a transition on 0, and finally one more time from $q_0$ to $q_{n-1}$. 
It follows then that 

$$L(M_1) \cap L(M_2) \subseteq \{x \in \Sigma^* : \exists\, i, j \in \mathbb{N} \text{ such that } i > 0,\ j \ge 0,\ |x|_1 = i(m-1) + jm,\ |x|_0 \ge i(n-m+1) - 1, \text{ and } i(m-1) + jm \equiv 0 \pmod{m}\}.$$



Further, for every such $i$ and $j$, there exists a corresponding element in 
$L(M_1) \cap L(M_2)$. Since $m - 1$ and $m$ are relatively prime, the shortest such word 
corresponds to $i = m$, $j = 0$, and satisfies $|x|_0 = m(n-m+1) - 1$. In particular, a 
shortest accepted word is $(1^{m-1}0^{n-m+1})^{m-1}\,1^{m-1}\,0^{n-m}$, which is of length $mn - 1$. □ 
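
The construction can be sanity-checked for small $m \le n$ by breadth-first search on the product DFA. This is a sketch under stated assumptions: states $p_a$ and $q_a$ are encoded by their indices, $\delta_1(p_a, c) = p_{(a+c) \bmod m}$, and $\delta_2$ is taken to send $q_a$ to $q_{a+c}$ for $a < m-1$, and otherwise to $q_{(a+1) \bmod n}$ on 0 and to $q_0$ on 1. The function names are hypothetical.

```python
from collections import deque

def delta1(a, c, m):
    # M1 counts the 1's modulo m.
    return (a + c) % m

def delta2(a, c, m, n):
    # Assumed reading of the transition function of M2 (see the lead-in).
    if a < m - 1:
        return a + c
    if c == 0:
        return (a + 1) % n
    return 0

def lss_product(m, n):
    """Length of a shortest word accepted by both DFAs, assuming m <= n."""
    start = (0, 0)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p, q = queue.popleft()
        if p == 0 and q == n - 1:  # M1 accepts in p_0, M2 accepts in q_{n-1}
            return dist[(p, q)]
        for c in (0, 1):
            nxt = (delta1(p, c, m), delta2(q, c, m, n))
            if nxt not in dist:
                dist[nxt] = dist[(p, q)] + 1
                queue.append(nxt)
    return None

print(lss_product(2, 3))  # 5 = 2*3 - 1, witnessed by the word 10010
```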

We can also obtain a bound for the unary case. Let $F(m, n)$ be the function 
defined in [7]. 

Theorem 3. Given unary DFA's $M_1$ (resp., $M_2$) with $m$ (resp., $n$) states, 
accepting $L_1$ (resp., $L_2$), we have $\mathrm{lss}(L_1 \cap L_2) \le F(m, n) - 1$. Furthermore, for 
all $m, n \ge 1$ there exist unary DFA's of $m$ and $n$ states achieving this bound. 

Proof. Follows from [7]. 

3 The second problem 

Recall the Post correspondence problem: we are given two finite nonempty 
languages $A = \{x_1, x_2, \ldots, x_n\}$ and $B = \{y_1, y_2, \ldots, y_n\}$, and we want to 
determine if there exist $r \ge 1$ and a finite sequence of indices $i_1, i_2, \ldots, i_r$ such that 
$x_{i_1} \cdots x_{i_r} = y_{i_1} \cdots y_{i_r}$. As is well-known, this problem is undecidable. 

Levent Alpoge [2] asked about the variant where we throw away the 
"correspondence": determine if there exist $r, s \ge 1$ and two finite sequences of indices 
$i_1, \ldots, i_r$ and $j_1, \ldots, j_s$ such that $x_{i_1} \cdots x_{i_r} = y_{j_1} \cdots y_{j_s}$. In other words, we want 
to decide if $A^+ \cap B^+ \neq \emptyset$. 

This variant is, of course, decidable. In fact, even a more general version is 
decidable, where the languages need not be finite. 

Proposition 3. Suppose $A$ is a language accepted by an NFA $M_1$ with $s_1$ states 
and $t_1$ transitions, and $B$ is accepted by an NFA $M_2$ with $s_2$ states and $t_2$ 
transitions. Then we can decide in $O(s_1 s_2 + t_1 t_2)$ time whether $A^+ \cap B^+ \neq \emptyset$. 

Proof. Given NFA $M_1 = (Q_1, \Sigma, \delta_1, q_1, F_1)$ accepting $A$, we can create an NFA-$\epsilon$ 
$M_1' = (Q_1, \Sigma, \delta_1', q_1, F_1)$ accepting $A^+$ by adding an $\epsilon$-transition from every final 
state of $M_1$ back to $q_1$. We can apply a similar construction to create 
$M_2' = (Q_2, \Sigma, \delta_2', q_2, F_2)$ accepting $B^+$. Then we can create an NFA-$\epsilon$ $M$ accepting 
$A^+ \cap B^+$ using the usual direct product construction. Since this construction 
is crucial to what follows, and since there is one subtle point, we describe it in 
some detail. 

Given $M_1' = (Q_1, \Sigma, \delta_1', q_1, F_1)$ and $M_2' = (Q_2, \Sigma, \delta_2', q_2, F_2)$ as above, 
$M = (Q, \Sigma, \delta, q_0, F)$, where $Q = Q_1 \times Q_2$, $q_0 = [q_1, q_2]$, and $F = F_1 \times F_2$. The 
transition function $\delta$ is defined as follows: 

For $p \in Q_1$, $q \in Q_2$, and $a \in \Sigma \cup \{\epsilon\}$ we have $[p', q'] \in \delta([p, q], a)$ if $p' \in \delta_1'(p, a)$ 
and $q' \in \delta_2'(q, a)$. These transitions correspond to the usual direct product edges 
of the transition diagram. 




However, we also need edges in which one machine performs an explicit 
$\epsilon$-transition, and the other machine performs an implicit $\epsilon$-transition by simply 
staying in its own state. This corresponds to including the transitions 
$[p', q'] \in \delta([p, q], \epsilon)$ if $p' \in \delta_1'(p, \epsilon)$ and $q = q'$, or if $p' = p$ and $q' \in \delta_2'(q, \epsilon)$. 

This construction results in an NFA-$\epsilon$ accepting $A^+ \cap B^+$ and having at 
most $t_1 t_2 + 2 s_1 s_2$ transitions. 

Now we can use the usual breadth-first or depth-first search to solve the 
emptiness problem. □ 
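
The product with explicit and implicit $\epsilon$-moves can be searched on the fly. The following is a minimal sketch, assuming each NFA is a triple `(start, finals, delta)` with `delta[(state, symbol)]` a set of successors; since $\epsilon$-edges cost nothing and letter edges cost one, it uses a 0-1 breadth-first search, which also yields the length of a shortest string. The encoding and the function name are illustrative assumptions.

```python
from collections import deque
from math import inf

def lss_plus_intersection(nfa1, nfa2):
    """Length of a shortest string in A+ ∩ B+, or None if the intersection is empty."""
    (s1, f1, d1), (s2, f2, d2) = nfa1, nfa2
    symbols = {a for (_, a) in d1} & {a for (_, a) in d2}
    dist = {(s1, s2): 0}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        here = dist[(p, q)]
        if p in f1 and q in f2:
            return here
        moves = []
        # One machine takes its epsilon-edge back to the start; the other stays put.
        if p in f1:
            moves.append(((s1, q), 0))
        if q in f2:
            moves.append(((p, s2), 0))
        # Ordinary direct-product edges on a common letter.
        for a in symbols:
            for p2 in d1.get((p, a), ()):
                for q2 in d2.get((q, a), ()):
                    moves.append(((p2, q2), 1))
        for nxt, cost in moves:
            if dist.get(nxt, inf) > here + cost:
                dist[nxt] = here + cost
                if cost == 0:
                    queue.appendleft(nxt)
                else:
                    queue.append(nxt)
    return None

# A = {aa} and B = {aaa}: the shortest common power is a^6.
nfa_a = (0, {2}, {(0, "a"): {1}, (1, "a"): {2}})
nfa_b = (0, {3}, {(0, "a"): {1}, (1, "a"): {2}, (2, "a"): {3}})
print(lss_plus_intersection(nfa_a, nfa_b))  # 6
```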

Corollary 1. Given NFA's $M_1$ accepting $L_1$ (resp., $M_2$ accepting $L_2$) of $m$ 
(resp., $n$) states, the shortest string in $L_1^+ \cap L_2^+$ is of length at most $mn - 1$. 

Suppose $m \ge n \ge 1$. Then there exists $M_1$ accepting $L_1$ (resp., $M_2$ accepting 
$L_2$) of $m$ (resp., $n$) states such that the shortest string in $L_1^+ \cap L_2^+$ is of length 
$\ge (m-1)n$. 

Proof. The first assertion follows from Proposition 3. 

For the second assertion, we can take $M_1$ and $M_2$ as in the proof of 
Theorem 2. Clearly $L_1 = L_1^+$. When we apply our construction to $M_2$ to create $L_2^+$, 
we add an $\epsilon$-transition from $q_{n-1}$ back to $q_0$. The effect is to allow one less 0 in 
each cycle through the states. As in the proof of Theorem 2, to get the proper 
number of 1's, we must have $i = m$, and hence the shortest string in $L_1^+ \cap L_2^+$ 
is of length $(m-1)n$. □ 

We can improve the upper bound to $mn - 2$ as follows: 

Theorem 4. For any $m$-state DFA $M_1$ and $n$-state DFA $M_2$ such that 
$L(M_1)^+ \cap L(M_2)^+ \neq \emptyset$ we have $\mathrm{lss}(L(M_1)^+ \cap L(M_2)^+) < mn - 1$. 

Proof. Assume, contrary to what we want to prove, that we have DFAs $M_1$ and 
$M_2$ with $m$ and $n$ states, respectively, such that $\mathrm{lss}(L(M_1)^+ \cap L(M_2)^+) = mn - 1$. 
Let $M_1$ be the DFA given by $(Q_1, \Sigma, \delta_1, p_0, F_1)$, where $Q_1 = \{p_0, p_1, p_2, \ldots, p_{m-1}\}$, 
and let $M_2$ be the DFA given by $(Q_2, \Sigma, \delta_2, q_0, F_2)$, where $Q_2 = \{q_0, q_1, q_2, \ldots, q_{n-1}\}$. 
Then let $M_1'$ and $M_2'$ be the $\epsilon$-NFAs obtained by adding $\epsilon$-transitions from the 
final states to the start states in $M_1$ and $M_2$, respectively. Let $M$ be the $\epsilon$-NFA 
obtained by applying the cross-product construction to $M_1'$ and $M_2'$. Then $M$ 
accepts $L(M_1)^+ \cap L(M_2)^+$. 

If $M$ has more than one final state, a shortest accepting path would only visit 
one of them, and this immediately gives a contradiction. So, assume each of $M_1$ 
and $M_2$ has only one final state; that is, $F_1 = \{p_x\}$ and $F_2 = \{q_y\}$ for some 
$p_x \in Q_1$ and $q_y \in Q_2$. Then $M = (Q_1 \times Q_2, \Sigma, \delta, [p_0, q_0], [p_x, q_y])$, where for all 
$p_i \in Q_1$, $q_j \in Q_2$, $a \in \Sigma$, we have $\delta([p_i, q_j], a) = [\delta_1(p_i, a), \delta_2(q_j, a)]$. Note that $M$ 
has $\epsilon$-transitions from $[p_x, q_j]$ to $[p_0, q_j]$ for all $q_j \in Q_2$ and from $[p_i, q_y]$ to 
$[p_i, q_0]$ for all $p_i \in Q_1$. 

Let $w_1$ be a shortest word accepted by $M_1$ and $w_2$ be a shortest word accepted 
by $M_2$. Then $\delta([p_0, q_0], w_1) = [p_x, q_i]$ for some $i$ such that $q_i \in Q_2$, and while 
carrying out this computation we never pass through two states $[p_a, q_b]$ and $[p_c, q_d]$ 
such that $a = c$. Likewise, $\delta([p_0, q_0], w_2) = [p_j, q_y]$ for some $j$ such that $p_j \in Q_1$, 
and while carrying out this computation we never pass through two states $[p_a, q_b]$ 
and $[p_c, q_d]$ such that $b = d$. If both $x = 0$ and $y = 0$ the shortest accepted string 
is $\epsilon$, so without loss of generality, assume $x \neq 0$. Then $\delta([p_0, q_0], w_1) = [p_x, q_0]$ or 
else we can visit $|w_1| + 2$ states with $|w_1|$ symbols by using an $\epsilon$-transition and 
we get a contradiction. If $y = 0$, $w_1$ is the shortest string accepted by $M$ and 
we have a contradiction. So, $y \neq 0$ and $\delta([p_0, q_0], w_2) = [p_0, q_y]$. It follows that 
reading $w_1$ from the initial state brings us to $[p_x, q_0]$ without passing through 
$[p_0, q_y]$, and reading $w_2$ from the initial state brings us to $[p_0, q_y]$ without passing 
through $[p_x, q_0]$. So, a shortest accepting path need only visit one of $[p_x, q_0]$ and 
$[p_0, q_y]$, and again we have a contradiction. □ 

We do not know an exact bound for this problem. However, for the unary 
case, we can obtain an exact bound based on a function $G$ introduced in [7]. 
Define $G(m, n) = \max_{1 \le i \le m,\ 1 \le j \le n} \mathrm{lcm}(i, j)$, and define the variant 

$$G'(m, n) = \max_{\substack{1 \le i \le m,\ 1 \le j \le n \\ (i, j) \neq (m, n)}} \mathrm{lcm}(i, j).$$

Then $G'(m, n) = \max(G(m-1, n), G(m, n-1))$. The function $G$ is a very 
difficult one to estimate, although deep results in analytic number theory give 
some upper and lower bounds [7]. 
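
For small arguments both functions can be tabulated by brute force, which also lets one check the identity relating $G'$ to $G$. A minimal sketch; the names `G` and `G_prime` mirror the text, and the convention that a maximum over an empty set is 0 is an assumption made here for the boundary cases.

```python
from math import gcd

def lcm(i, j):
    return i * j // gcd(i, j)

def G(m, n):
    """max lcm(i, j) over 1 <= i <= m, 1 <= j <= n (0 if the range is empty)."""
    return max((lcm(i, j) for i in range(1, m + 1) for j in range(1, n + 1)),
               default=0)

def G_prime(m, n):
    """Same maximum, but excluding the single pair (i, j) = (m, n)."""
    return max((lcm(i, j)
                for i in range(1, m + 1)
                for j in range(1, n + 1)
                if (i, j) != (m, n)),
               default=0)

print(G(3, 4), G_prime(3, 4))  # 12 6
```

Excluding the pair (m, n) leaves exactly the pairs with i < m or j < n, which is why the maximum splits into the two smaller rectangles.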

Theorem 5. If $M_1$ (resp., $M_2$) is a unary NFA with $m$ states (resp., $n$ states) 
and $L_1 = L(M_1)$ (resp., $L_2 = L(M_2)$), then $\mathrm{lss}(L_1^+ \cap L_2^+) \le G'(m, n)$. 
Furthermore, for all $m, n \ge 1$ there exist unary DFA's of $m$ and $n$ states, respectively, 
achieving this bound. 

Proof. Assume the input alphabet of both $M_1$ and $M_2$ is $\Sigma = \{a\}$. Let $c_1$ (resp., 
$c_2$) be the length of the shortest nonempty string in $L_1$ (resp., $L_2$). Clearly 
$c_1 \le m$ and $c_2 \le n$. Furthermore, if $c_1 = m$, then $L_1 = (a^m)^*$, and similarly if 
$c_2 = n$ then $L_2 = (a^n)^*$. Hence if $(c_1, c_2) = (m, n)$, then $\epsilon \in L_1^+ \cap L_2^+$, and 
hence $\mathrm{lss}(L_1^+ \cap L_2^+) = 0 \le G'(m, n)$. Otherwise either $c_1 < m$ or $c_2 < n$. 
Without loss of generality, assume $c_2 < n$. Then $a^{\mathrm{lcm}(c_1, c_2)} \in L_1^+ \cap L_2^+$, and 
$\mathrm{lss}(L_1^+ \cap L_2^+) \le \mathrm{lcm}(c_1, c_2) \le G(m, n-1) \le G'(m, n)$. 

Now suppose we are given $m$ and $n$. Let $i, j$ be the integers maximizing 
$\mathrm{lcm}(i, j)$ over $1 \le i \le m$, $1 \le j \le n$ with $(i, j) \neq (m, n)$. If $i < m$, choose 
$L_1 = (a^i)^+$, which can be accepted by a DFA with $i + 1 \le m$ states, and choose 
$L_2 = (a^j)^*$, which can be accepted by a DFA with $j \le n$ states. Otherwise, 
reverse the roles of $m$ and $n$. Thus we get DFA's of $m$ and $n$ states, respectively, 
achieving $\mathrm{lss}(L_1^+ \cap L_2^+) = G'(m, n)$. □ 



4 The third problem 

Another variation on the Post correspondence problem, also proposed by Alpoge 
[2], is more interesting. Here we throw away only part of the "correspondence": 
given $A = \{x_1, x_2, \ldots, x_n\}$ and $B = \{y_1, y_2, \ldots, y_n\}$, we want to decide if there 
exist $r \ge 1$ and two finite sequences of indices $i_1, i_2, \ldots, i_r$ and $j_1, j_2, \ldots, j_r$ such 
that $x_{i_1} \cdots x_{i_r} = y_{j_1} \cdots y_{j_r}$. In other words, we only demand that the number of 
words on each side be the same. 

This case is also efficiently decidable, even when A and B are possibly infinite 
regular languages. 

Theorem 6. Let $M_1$ (resp., $M_2$) be an NFA with $s_1$ states and $t_1$ transitions 
(resp., $s_2$ states and $t_2$ transitions). We can decide in polynomial time (in 
$s_1, s_2, t_1, t_2$) whether there exists $k$ such that $L(M_1)^k \cap L(M_2)^k \neq \emptyset$. 

Proof. First, we prove the (possibly surprising?) result that 

$$L = \bigcup_{k \ge 1} \left( L(M_1)^k \cap L(M_2)^k \right)$$

is a context-free language. 

We construct a pushdown automaton M accepting L. On input x, our PDA 
attempts to construct two same-length factorizations of x: one into elements of 
L(Mi), and one into elements of L{M2). To ensure the factorizations are really 
of the same length, we use the stack of the PDA to maintain a counter that 
records the absolute value of the difference between the number of factors in the 
first factorization and the number of factors in the second. The appropriate sign 
of the difference is maintained in the state of the PDA. 

As we read $x$, we simulate the NFA's $M_1$ and $M_2$. If we reach a final state in 
either machine, then we have the option (nondeterministically) to deem this the 
end of a factor in the appropriate factorization, and update the stack accordingly, 
or continue with the simulation. We accept if the stack records a difference of 
0, that is, if the stack contains no counters and only the initial stack symbol $Z_0$, 
and we are in a final state in both machines (indicating that the factorization 
is complete into elements of both $L(M_1)$ and $L(M_2)$). 

Thus we have shown that $L$ is context-free. Furthermore, our PDA has 
$O(s_1 s_2)$ states and $O(t_1 t_2)$ transitions. It uses only two distinct stack symbols 
(the counter and the initial stack symbol) and never pushes more than one 
additional symbol on the stack in any transition. Such a PDA can be converted to a 
context-free grammar $G$, using the standard "triple construction" [6, Thm. 5.4], 
with $O(s_1^2 s_2^2)$ variables and $O(s_1^2 s_2^2 t_1 t_2)$ productions. Now we can test the emptiness 
of the language generated by a context-free grammar of size $t$ in $O(t)$ time, by 
removing useless symbols and seeing if any productions remain [6, Thm. 4.2]. 

We conclude that it is decidable in polynomial time whether there exists $k$ 
such that $L(M_1)^k \cap L(M_2)^k \neq \emptyset$. □ 
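
The counter idea can be illustrated without building the PDA or the grammar. The sketch below is not the construction of the proof: it swaps in a direct 0-1 breadth-first search over triples (state of $M_1$, state of $M_2$, difference in completed factors), with the difference capped by a caller-supplied `max_diff`. That cap, the NFA encoding `(start, finals, delta)`, and the function name are all assumptions of this sketch, so a result of `None` only means that nothing was found within the cap.

```python
from collections import deque
from math import inf

def lss_equal_factor_count(nfa1, nfa2, max_diff):
    """Shortest x in some L(M1)^k ∩ L(M2)^k with k >= 1, searching |diff| <= max_diff."""
    (s1, f1, d1), (s2, f2, d2) = nfa1, nfa2
    symbols = {a for (_, a) in d1} & {a for (_, a) in d2}
    start = (s1, s2, 0)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        p, q, diff = node
        here = dist[node]
        # Both current factors can end here and the completed counts agree.
        if p in f1 and q in f2 and diff == 0:
            return here
        moves = []
        if p in f1 and diff < max_diff:    # close a factor of the first factorization
            moves.append(((s1, q, diff + 1), 0))
        if q in f2 and diff > -max_diff:   # close a factor of the second factorization
            moves.append(((p, s2, diff - 1), 0))
        for a in symbols:                  # both machines read the same letter
            for p2 in d1.get((p, a), ()):
                for q2 in d2.get((q, a), ()):
                    moves.append(((p2, q2, diff), 1))
        for nxt, cost in moves:
            if dist.get(nxt, inf) > here + cost:
                dist[nxt] = here + cost
                if cost == 0:
                    queue.appendleft(nxt)
                else:
                    queue.append(nxt)
    return None

# L(M1) = {a, aaa}, L(M2) = {aa}: k = 2 works via a^4 = aaa.a = aa.aa.
m1 = (0, {1, 3}, {(0, "a"): {1}, (1, "a"): {2}, (2, "a"): {3}})
m2 = (0, {2}, {(0, "a"): {1}, (1, "a"): {2}})
print(lss_equal_factor_count(m1, m2, max_diff=3))  # 4
```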

Remark 1. There exist simple examples where $L = \bigcup_{k \ge 1} (L(M_1)^k \cap L(M_2)^k)$ 
is not regular. For example, take $L(M_1) = b^*ab^*$ and $L(M_2) = a^*ba^*$. Then 
$L = \{x \in \{a, b\}^* : |x|_a = |x|_b \ge 1\}$, the language of nonempty strings with the 
same number of a's and b's. 
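
This example is easy to confirm by brute force on short strings. A minimal sketch using regular expressions for the two languages; the helper name `in_union` and the bound `max_k` on the number of factors are assumptions of the sketch (every factor contains a letter, so `max_k = len(x)` suffices).

```python
import re
from itertools import product

def in_union(x, max_k):
    """True if x lies in (b*ab*)^k ∩ (a*ba*)^k for some 1 <= k <= max_k."""
    return any(
        re.fullmatch(r"(?:b*ab*){%d}" % k, x) is not None
        and re.fullmatch(r"(?:a*ba*){%d}" % k, x) is not None
        for k in range(1, max_k + 1)
    )

# Compare with the counting description on all strings of length at most 6.
for length in range(7):
    for letters in product("ab", repeat=length):
        x = "".join(letters)
        expected = x.count("a") == x.count("b") and x.count("a") >= 1
        assert in_union(x, len(x)) == expected
```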



Furthermore, if $M_1, M_2, M_3$ are all NFA's, then the analogous language 

$$L = \bigcup_{k \ge 1} \left( L(M_1)^k \cap L(M_2)^k \cap L(M_3)^k \right)$$

need not be context-free. A counterexample is given by taking $L(M_1) = \{b, c\}^* a \{b, c\}^*$, 
$L(M_2) = \{a, c\}^* b \{a, c\}^*$, and $L(M_3) = \{a, b\}^* c \{a, b\}^*$. Then 

$$L = \{x \in \{a, b, c\}^* : |x|_a = |x|_b = |x|_c \ge 1\},$$

which is clearly not context-free. 

Remark 2. Mike Domaratzki (personal communication) observes that the 
decision problem "given $M_1, M_2$, does there exist $k \ge 1$ such that 
$L(M_1)^k \cap L(M_2)^k \neq \emptyset$" becomes undecidable if $M_1$ and $M_2$ are pushdown automata, by 
reduction from the problem "given CFG's $G_1, G_2$, is $L(G_1) \cap L(G_2) \neq \emptyset$" 
[6, Theorem 8.10]. Given $G_1$ and $G_2$, we can easily create PDA's accepting 
$L_1 := L(G_1)\#$ and $L_2 := L(G_2)\#$, where $\#$ is a new symbol not in the 
alphabet of either $G_1$ or $G_2$. Then $L_1^k \cap L_2^k \neq \emptyset$ for some $k \ge 1$ if and only if 
$L(G_1) \cap L(G_2) \neq \emptyset$. A similar result holds for the linear context-free languages. 

We now turn to the question of, given regular languages $A$ and $B$, determining 
the shortest string in $L = \bigcup_{k \ge 1} (A^k \cap B^k)$, given that it is nonempty. Actually, 
we consider a more general problem, where we intersect more than two languages. 
We start by proving a result about directed graphs. 

Lemma 1. Suppose $G = (V, E)$ is a directed graph with edge weights in $\mathbb{Z}^d$, 
where the components of the edge weights are all bounded in absolute value by $K$. 
Let $\sigma(p)$ denote the weight of a path $p$, obtained by summing the weights of all 
associated edges. If $G$ contains a cycle $C : u \to u$ such that $\sigma(C) = \mathbf{0} = (0, 0, \ldots, 0)$, 
then $G$ also contains a cycle $C' : u \to u$ with $\sigma(C') = \mathbf{0} = (0, 0, \ldots, 0)$ and length 
at most $|V|^{d+1} K^d d^{d/2} (|V|^2 + d)$. 

Proof. For each vertex $v$ in the cycle $C$, break $C$ at the first occurrence of $v$. 
This gives us 

$$C = P_1 P_2 P_3 \cdots P_k$$

such that $P_1 : v_1 \to v_2$, $P_2 : v_2 \to v_3$, $\ldots$, $P_k : v_k \to v_{k+1}$, where $\{v_1, \ldots, v_k\}$ is the 
set of vertices visited by $C$. The final vertex, $v_{k+1}$, is the same as $v_1$ because $C$ 
is a cycle. Notice that $k \le |V|$ because each vertex appears at most once in the 
list $v_1, \ldots, v_k$. 

For each $P_i : v_i \to v_{i+1}$, generate a new path $\hat{P}_i : v_i \to v_{i+1}$ by removing 
all simple subcycles. The length of $\hat{P}_i$ is at most $|V|$; otherwise some vertex is 
repeated, so we have not removed all subcycles. Recombine the $\hat{P}_i$'s into a cycle 
$T = \hat{P}_1 \cdots \hat{P}_k$ having length $|T| \le |V| k \le |V|^2$. In addition to $T$, we have a list 
of simple subcycles $B_1, \ldots, B_\ell$ that we removed while generating the $\hat{P}_i$'s. 



Consider the cycles we can construct using $T, B_1, B_2, \ldots, B_\ell$. For any $B_i$, we 
know $T$ visits the starting vertex of $B_i$ because $T$ visits all the vertices in $C$. 
Therefore we can splice $B_i$ into $T$ at its starting vertex. Since $B_i$ is a cycle, we 
can insert it into $T$ any positive number of times. We can also append $T$ to the 
whole cycle as many times as we like. These techniques allow us to construct a 
cycle with weight 

$$t\,\sigma(T) + b_1 \sigma(B_1) + \cdots + b_\ell \sigma(B_\ell)$$

where $t \ge 1$ and $b_1, \ldots, b_\ell \ge 0$ are all integers. 

Recall that $T, B_1, \ldots, B_\ell$ were constructed by decomposing $C$. Each edge 
from $C$ exists somewhere in $T, B_1, \ldots, B_\ell$, so we have 

$$\mathbf{0} = \sigma(C) = \sigma(T) + \sigma(B_1) + \cdots + \sigma(B_\ell).$$

This shows that it is possible to write $\mathbf{0}$ as an integer linear combination of 
$\sigma(T), \sigma(B_1), \ldots, \sigma(B_\ell)$. Unfortunately, for each nonzero $b_i$ we have at least one 
copy of $B_i$, with length at most $|V|$. Since all the $b_i$'s are nonzero and $\ell$ is 
unbounded, the corresponding cycle has unbounded length. If we hope to find a 
bounded cycle by this technique then we need to bound the number of nonzero 
$b_i$'s. Let us approach the problem with linear programming. Construct a matrix 
$A \in \mathbb{R}^{d \times \ell}$ where the $i$th column is given by $A^{(i)} = \sigma(B_i)$. Let $b \in \mathbb{R}^d$ be the 
column vector $-\sigma(T)$. We are looking for solutions to the problem 

$$Ax = b, \quad x \ge 0, \quad x \in \mathbb{R}^\ell.$$

This is just the feasible set of a linear program in standard equality form. We 
saw earlier that it has the feasible solution $x = (1\ 1\ \cdots\ 1\ 1)^T$. Note that if $A$ is 
not full rank then we remove linearly dependent rows until we have a full rank 
matrix, and proceed with a matrix of rank $d' < d$. 

Linear programming theory tells us a feasible problem of this form has a basic 
feasible solution $x^*$ with at most $d$ nonzero entries. Without loss of generality 
(relabelling if necessary), take all but the first $d$ entries of $x^*$ to be zero. Letting $\hat{A}$ 
be the first $d$ columns of $A$, the basic solution $x^*$ satisfies the following equation: 

$$\sigma(B_1) x_1^* + \cdots + \sigma(B_d) x_d^* = -\sigma(T).$$

We are not done yet because the $x_i^*$'s are real numbers and we need an integer 
linear combination. Cramer's rule gives an explicit solution for each coefficient, 

$$x_i^* = \frac{\det(\hat{A}_i)}{\det(\hat{A})} = \frac{|\det(\hat{A}_i)|}{|\det(\hat{A})|},$$

where $\hat{A}_i$ is the matrix $\hat{A}$ with the $i$th column replaced 
by $b$. Note that $\hat{A}$ and $\hat{A}_i$ are integer matrices, so their determinants are integers 
and $x_i^*$ is a rational number. When we multiply through by $|\det(\hat{A})|$, all the 
coefficients will be nonnegative integers: 

$$\sigma(B_1) |\det(\hat{A}_1)| + \cdots + \sigma(B_d) |\det(\hat{A}_d)| + \sigma(T) |\det(\hat{A})| = \mathbf{0}.$$
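
The scaling step can be tried on small integer data. This is a sketch under stated assumptions: the $d$ selected cycle weights are given as integer lists and are linearly independent, the solution given by Cramer's rule is nonnegative, and the determinant is computed by Laplace expansion, which is only sensible for small $d$. The function names are hypothetical.

```python
def det(matrix):
    """Integer determinant by Laplace expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def integer_combination(cycle_weights, t_weight):
    """Return (coeffs, t_coeff) with sum_i coeffs[i]*cycle_weights[i] + t_coeff*t_weight = 0.

    cycle_weights: d integer vectors of dimension d; t_weight: integer vector of dimension d.
    """
    d = len(t_weight)
    # Column i of A is the weight of the i-th cycle; b is minus the weight of T.
    A = [[cycle_weights[j][i] for j in range(d)] for i in range(d)]
    b = [-w for w in t_weight]
    det_A = det(A)
    if det_A == 0:
        raise ValueError("cycle weights are linearly dependent")
    coeffs = []
    for i in range(d):
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        det_A_i = det(A_i)
        if det_A_i * det_A < 0:
            raise ValueError("Cramer solution has a negative coordinate")
        coeffs.append(abs(det_A_i))
    return coeffs, abs(det_A)

coeffs, t_coeff = integer_combination([[2, 0], [0, 3]], [-1, -1])
print(coeffs, t_coeff)  # [3, 2] 6, since 3*(2,0) + 2*(0,3) + 6*(-1,-1) = (0,0)
```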



We can bound the determinants with Hadamard's inequality, which says that 
the determinant of a matrix $M$ is bounded by the product of the norms of its 
columns. Each $B_i$ is a simple cycle, so $|B_i| \le |V|$. It follows that any entry 
of $\sigma(B_i)$ is at most $|V| K$, so $\|\sigma(B_i)\| \le |V| K \sqrt{d}$. On the other hand, $T$ has 
length at most $|V|^2$, giving $\|\sigma(T)\| \le |V|^2 K \sqrt{d}$. Combining these estimates 
gives $|\det(\hat{A}_i)| \le |V|^{d} K^d d^{d/2}$ for all $i$ and $|\det(\hat{A})| \le |V|^{d+1} K^d d^{d/2}$. Now we 
construct the cycle $C'$ from this linear combination, with $|\det(\hat{A})|$ copies of $T$ 
and $|\det(\hat{A}_i)|$ copies of each $B_i$. By construction, $C'$ has weight $\mathbf{0}$ and its length 
is bounded as follows: 

$$|C'| = |\det \hat{A}|\,|T| + \sum_{i=1}^{d} |\det \hat{A}_i|\,|B_i| \le |V|^{d+1} K^d d^{d/2} \cdot |V|^2 + \sum_{i=1}^{d} |V|^{d} K^d d^{d/2} \cdot |V| = |V|^{d+1} K^d d^{d/2} (|V|^2 + d).$$

□ 

Corollary 2. Consider a generalization of the third problem to $d$ languages 
$L_1, L_2, \ldots, L_d$ accepted by NFA's having $s_1, \ldots, s_d$ states, respectively. If 

$$\bigcup_{k \ge 1} \left( L_1^k \cap \cdots \cap L_d^k \right)$$

is nonempty, then the shortest string in the language has length bounded by 

$$O\!\left(s^{d} (d-1)^{(d-1)/2} (s^2 + d - 1)\right),$$

where $s = (s_1 + 1)(s_2 + 1) \cdots (s_d + 1)$. 

Proof. We discuss the case $d = 2$, and then briefly indicate how this is generalized 
to the general case. 

First we discuss an automaton $M_K = (Q_K, \Sigma, \delta_K, q_K, F_K)$ accepting 
$K = A^* \cap B^*$ which is a slight variant of the construction given in the proof of 
Proposition 3 above. 

Suppose we are given a regular language A (resp., B) accepted by an NFA Mi 
(resp., M2). Without loss of generality, we will assume that Mi (resp., M2) has no 
transitions into its initial state. This can be accomplished, if necessary, by adding 
one new state with transitions out the same as the transitions out of the initial 
state, and redirecting any transitions into the initial state to the new state. If 
the original machine had s states, then the new machine has at most s + 1 states. 
Call these new machines $M_1' = (Q_1, \Sigma, \delta_1, q_1, F_1)$ and $M_2' = (Q_2, \Sigma, \delta_2, q_2, F_2)$. 

Next we create an NFA-$\epsilon$ $M_1'' = (Q_1, \Sigma, \delta_1', q_1, F_1')$ by adding an $\epsilon$-transition 
from every final state of $M_1'$ back to its initial state, and by changing the set of 
final states to be $F_1' = \{q_1\}$. This new machine $M_1''$ accepts $A^*$. We carry out a 
similar construction on $M_2'$, obtaining $M_2''$ accepting $B^*$. 



Finally, mimicking the construction of Proposition 3, we create an NFA-$\epsilon$ $M_K$ 
accepting $K = A^* \cap B^*$ using the direct product construction outlined above on 
$M_1''$ and $M_2''$. Note that $M_K$ has at most $(s_1 + 1)(s_2 + 1)$ states and has exactly 
one accepting state, which is its initial state. 

We define the edge weights of $M_K$ to be in $\mathbb{Z}$ as follows. An explicit $\epsilon$-transition 
in $M_1''$ or $M_2''$ marks the end of a word, so each explicit $\epsilon$-transition taken in $M_1''$ 
back to the start gets weight $+1$, while each explicit $\epsilon$-transition in $M_2''$ back to 
the start gets weight $-1$. All other edges get weight 0. In this way we keep track 
of the difference between the number of factors used in $L(M_1')$ and $L(M_2')$. 

For the general case, we form the intersection automaton as before, and define 
the $i$'th coordinate of $\sigma(P)$, for $1 \le i < d$, to be the difference in the number of 
$\epsilon$-transitions taken in $M_i''$ and $M_{i+1}''$. Now just apply Lemma 1 to get the desired 
bound. □ 

When $d = 2$, we can improve on the result of the previous lemma: 

Theorem 7. If $d = 2$, then the length of the cycle $C'$ in Lemma 1 is at most 
$2K|V|^2$. 

Proof. Remove simple cycles $B_1, B_2, \ldots, B_\ell$ from $C$ until we are left with $R$, 
which has no proper subcycles. It follows that $R$ must be a simple cycle, so we 
have decomposed $C$ into simple subcycles. Note that the weight of $C$ is the sum 
of the weights of all the $B_i$'s and $R$. 

If $R$ has weight 0 then take $C' = R$. We are done because $R$ has length at 
most $|V| \le 2K|V|^2$. If $R$ has nonzero weight then the positive and negative cases 
are identical so take $R$ to have positive weight without loss of generality. Then 
there must be some $B_i$ with negative weight, otherwise the sum of the weights of 
the $B_i$'s and $R$ would be positive, but $C$ has weight 0. Call the negative weight 
cycle $S$. 

If $R$ and $S$ have some vertex in common, then we can splice $\sigma(R)$ copies of 
$S$ into $-\sigma(S)$ copies of $R$ to get a cycle $C'$ of weight 0. Since $\sigma(R) \le K|R|$ and 
$|\sigma(S)| \le K|S|$, the cycle has length $|\sigma(R)||S| + |\sigma(S)||R| \le 2K|R||S| \le 2K|V|^2$. 

Otherwise, $R$ and $S$ have no vertex in common so we need to find some way 
to get from $R$ to $S$ and back again. Clearly $C$ passes through every vertex in 
$R$ and $S$, but we want a shorter cycle. Let $T$ be the shortest cycle that passes 
through some vertex in $R$ and some vertex in $S$. We will split $T$ into $\alpha$, the piece 
from $R$ to $S$, and $\beta$, the piece from $S$ to $R$. 

We know that $R$, $S$ are simple, and $\alpha$, $\beta$ must be simple or we could make a 
shorter cycle $T$ by making them shorter. Therefore, any vertex in $V$ occurs at 
most four times in $R$, $S$ and $T$, once for each of $R$, $S$, $\alpha$, $\beta$. But $R$ and $S$ have no 
vertices in common, so each vertex occurs at most three times in $R$, $S$ and $T$. 

Now if some vertex $v$ occurs three times in $R$, $S$ and $T$, then it must be in 
$\alpha$, $\beta$ and either $R$ or $S$ (without loss of generality, let it be in $R$). Then we can 
remove a prefix of $\alpha$ up to $v$, producing $\hat{\alpha}$. Similarly, remove a suffix of $\beta$ starting 
from $v$, giving $\hat{\beta}$. Then $\hat{\alpha}\hat{\beta}$ is a shorter cycle that visits $v \in R$ and still visits $S$, 
contradicting the minimality of $T$. Therefore any vertex $v$ occurs at most twice 
in $R$, $S$ and $T$, so $|R| + |S| + |T| \le 2|V|$. 



Let us combine $T$ with $R$ if $T$ has positive weight and $S$ if $T$ has negative 
weight to produce a cycle $Y$. Either $R$ or $S$ is left over, call it $X$. Note that $X$ 
and $Y$ have opposite sign weights, and also have a vertex in common. As before, 
we combine $|\sigma(X)|$ copies of $Y$ with $|\sigma(Y)|$ copies of $X$ to produce a cycle $C'$ of 
length at most $2K|X||Y|$. Under the constraint $|X| + |Y| = |R| + |S| + |T| \le 2|V|$, 
the length $2K|X||Y|$ is maximized when $|X| = |Y| = |V|$, with maximum value 
$2K|V|^2$, completing the proof. □ 

Finally, we prove an improvement for the unary case. 

Proposition 4. Let $A$, $B$ be nonempty finite languages over a unary alphabet, 
say $A = \{a^{m_1}, \ldots, a^{m_r}\}$ and $B = \{a^{n_1}, \ldots, a^{n_s}\}$. Then $A^k \cap B^k \neq \emptyset$ for some 
$k \ge 1$ iff $\min_{1 \le i \le r} m_i \le \max_{1 \le j \le s} n_j$ and $\min_{1 \le j \le s} n_j \le \max_{1 \le i \le r} m_i$. If both 
conditions hold, then $A^k \cap B^k \neq \emptyset$ for some $k < \max(m_1, \ldots, m_r, n_1, \ldots, n_s)$, 
and this bound is tight. 

Proof. Suppose $\min_{1 \le i \le r} m_i > \max_{1 \le j \le s} n_j$. Then every element of $A^k$ will 
be of length greater than every element of $B^k$. Similarly, if $\min_{1 \le j \le s} n_j > 
\max_{1 \le i \le r} m_i$, then every element of $B^k$ will be of length greater than every 
element of $A^k$. Hence if either condition holds, we have $A^k \cap B^k = \emptyset$ for all 
$k \ge 1$. 

Now suppose $\min_{1 \le i \le r} m_i \le \max_{1 \le j \le s} n_j$ and $\min_{1 \le j \le s} n_j \le \max_{1 \le i \le r} m_i$. 
Then there exist $a^l, a^n \in A$ and $a^m \in B$ such that $l \le m \le n$. Choose $i = n - m$ 
and $j = m - l$. Then $A^{i+j}$ contains $(a^l)^i (a^n)^j = a^{li + nj} = a^{ln - lm + nm - nl} = 
a^{m(n-l)}$, and $B^{i+j}$ contains $(a^m)^{i+j} = a^{m(n-l)}$. So for $k = i + j$ we get $A^k \cap B^k \neq 
\emptyset$. Now $i + j = n - l < n \le \max(m_1, \ldots, m_r, n_1, \ldots, n_s)$. 

The bound is tight, as can be seen by taking $A = \{a, a^n\}$ and $B = \{a^{n-1}\}$. 
Then the least $k$ such that $A^k \cap B^k \neq \emptyset$ is $k = n - 1$. □ 
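
The tight example is easy to reproduce by iterating sumsets of exponents. A minimal sketch, assuming the unary languages are given as sets of string lengths; the function name is hypothetical, and stopping the search at the largest length is an assumption that leans on the bound in the proposition.

```python
def least_equal_power(a_lengths, b_lengths):
    """Least k >= 1 with A^k ∩ B^k nonempty, searching k up to the largest length."""
    cap = max(max(a_lengths), max(b_lengths))
    a_k, b_k = {0}, {0}  # lengths of strings in A^0 and B^0
    for k in range(1, cap + 1):
        # Lengths in A^k are sums of one length from A^(k-1) and one from A.
        a_k = {x + y for x in a_k for y in a_lengths}
        b_k = {x + y for x in b_k for y in b_lengths}
        if a_k & b_k:
            return k
    return None

for n in range(2, 8):
    # A = {a, a^n}, B = {a^(n-1)}: the least k is n - 1.
    assert least_equal_power({1, n}, {n - 1}) == n - 1
```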